Toward Algorithmic Discovery of Biographical Information in Local Gazetteers of Ancient China
نویسندگان
چکیده
Difangzhi (地方志) is a large collection of local gazetteers complied by local governments of China, and the documents provide invaluable information about the host locality. This paper reports the current status of using natural language processing and text mining methods to identify biographical information of government officers so that we can add the information into the China Biographical Database (CBDB), which is hosted by Harvard University. Information offered by CBDB is instrumental for human historians, and serves as a core foundation for automatic tagging systems, like MARKUS of the Leiden University. Mining texts in Difangzhi is not easy partially because there is litter knowledge about the grammars of literary Chinese so far. We employed techniques of language modeling and conditional random fields to find person and location names and their relationships. The methods were evaluated with realistic Difangzhi data of more than 2 million Chinese characters written in literary Chinese. Experimental results indicate that useful information was discovered from the current dataset.
منابع مشابه
Mining and discovering biographical information in Difangzhi with a language-model-based approach
Background We present results of expanding the contents of the China Biographical Database by text mining historical local gazetteers, difangzhi地方志 . The goal of the database is to see how people are connected together, through kinship, social connections, and the places and offices in which they served. The gazetteers are the single most important collection of names and offices covering the S...
متن کاملA Comparative Study of the Metaphysical Basis of Ancient Iran-China Political Approach
Considering the cosmology with mythological form of consciousness era as the primary base of metaphysical form and the basis of development toward an integrated cosmology, political ideas has been placed in an organic link with a metaphysical system in the ancient Persians as well as Chinese political thought. Based on considerable similarities among cosmological systems in civilizations e.g....
متن کاملAn introduction to methods of discovering and identifying ancient sites with emphasis on evidence and geomorphologic techniques
Recognizing of position of ancient sites, it is of the great help to archaeologist. After this recognition, the archaeologist with rely on the knowledge and usual techniques in archaeology can determine the range of sites. After the discovery of this information, the archaeologist can get the information about the social, economic, livelihood and political of the past of sites. In this researc...
متن کاملProbabilistic Sufficiency and Algorithmic Sufficiency from the point of view of Information Theory
Given the importance of Markov chains in information theory, the definition of conditional probability for these random processes can also be defined in terms of mutual information. In this paper, the relationship between the concept of sufficiency and Markov chains from the perspective of information theory and the relationship between probabilistic sufficiency and algorithmic sufficien...
متن کاملDevelopment of an Efficient Hybrid Method for Motif Discovery in DNA Sequences
This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015